Abstract: Large-area assessments of forest biomass distribution are a
challenge in heterogeneous landscapes, where variations in tree growth
and species composition occur over short distances. In this study, we use
statistical and geospatial modeling on densely sampled forest biomass
data to analyze the relative importance of ecological and physiographic
variables as determinants of spatial variation of forest biomass in the
environmentally heterogeneous region of the Big Sur, California. We
estimated biomass in 280 forest plots (one plot per 2.85 km²) and
measured an array of ecological (vegetation community type, distance to
edge, amount of surrounding non-forest vegetation, soil properties, fire
history) and physiographic drivers (elevation, potential soil moisture and
solar radiation, proximity to the coast) of tree growth at each plot
location. Our geostatistical analyses revealed that biomass distribution is
spatially structured and autocorrelated up to 3.1 km. Regression tree (RT)
models showed that both physiographic and ecological factors influenced
biomass distribution. Across randomly selected sample densities (sample
size 112 to 280), ecological effects of vegetation community type and
distance to forest edge, and physiographic effects of elevation, potential
soil moisture and solar radiation were the most consistent predictors of
biomass. Topographic moisture index and potential solar radiation had a
positive effect on biomass, indicating the importance of topographically
mediated energy and moisture for plant growth and biomass accumulation.
The RT model explained 35% of the variation in biomass, and spatially
autocorrelated variation was retained in the regression residuals. A
regression kriging model, developed from the RT model combined with
kriging of regression residuals, was used to map biomass across the Big
Sur. This study demonstrates how statistical and geospatial modeling can
be used to discriminate the relative importance of physiographic and
ecological effects on forest biomass and to develop spatial models that
predict and map biomass distribution across a heterogeneous landscape.

Keywords: forest biomass; landscape heterogeneity; spatial variation; 
semivariogram; regression tree; regression kriging; Big Sur California 

Introduction 

Forest biomass holds more carbon than any other terrestrial biome
on Earth (Houghton 1999). Scientists have recently recognized
that the spatial distribution of forest biomass is critical to
measuring trends in forest carbon stocks through time (Hu and
Wang 2008), predicting the global carbon cycle (Houghton
2005), and mapping the risk of wildfire (Mickler et al. 2002; He
et al. 2004). Although our knowledge of how deforestation patterns
influence large-area estimates of carbon storage has increased,
especially in the tropics (e.g., Defries 2002), assessment
of forest biomass distribution in spatially heterogeneous landscapes
continues to present challenges, especially in regions with
an insufficient density of forest inventory plots. Across highly
heterogeneous landscapes, spatial variation in forest biomass may
be especially complex where variations in plant growth and species
composition occur over short distances due to sharp interactions
between physiographic (e.g., elevation and solar radiation),
ecological (e.g., forest edge effects and fire history), and
human-mediated (e.g., past logging and other land-use changes) factors
(Anderson et al. 2009; Clark and Clark 2000; Poulos 2009). As
environments become increasingly complex, approaches that
examine landscape heterogeneity and its influence on forest biomass
distribution are greatly needed to accurately assess trends
in the global carbon balance (Houghton 2005).

Regional-scale assessments of forest biomass distribution have
mostly relied on geographically extensive forest inventory data.
In the United States, for example, the Forest Inventory and Analysis
program (FIA; Smith 2002) provides the most practical data
for estimating forest biomass at regional to national scales
(Brown et al. 1999; Blackard et al. 2008; Hu and Wang 2008).
However, with an average density of one plot per ~2400 ha, FIA
data may be too sparsely distributed for many landscape-scale
studies of biomass variation. In a recent study, Freeman and
Moisen (2007) observed highly erratic patterns of spatial variation
in biomass distribution derived from FIA data for regions in
western North America. To counter this problem, models of
biomass distribution often aggregate biomass estimates to
coarser regions, such as political boundaries (e.g., Meng et al.
2007). Alternatively, biomass distribution models can be based
on correlations with key environmental variables that may be
applied in a geographic information system (GIS) to map spatially
explicit patterns of biomass (Bacinni et al. 2004; Blackard
et al. 2008). As spatial data layers of environmental and ecological
variables become increasingly available at finer spatial
resolution and across larger spatial extents, correlation-based
models offer prospects for developing finer-resolution estimates
and maps of biomass at regional to national extents.

The processes that underlie plant growth and biomass accumulation
in heterogeneous landscapes are influenced by a number of
ecological and physiographic factors that collectively influence
the flow of energy, moisture, and nutrients available to vegetation
communities (Chen et al. 1999; Lovett et al. 2005; Saatchi et
al. 2009). For example, physiographic influences of topography
on variations in climate, solar radiation, and soil moisture have
been increasingly incorporated into models of regional plant
biomass distribution (e.g., Bacinni et al. 2004). However, ecological
drivers of biomass that vary at finer spatial scales have
received less attention in models of biomass distribution
(Vanwalleghem and Meentemeyer 2009). Ecosystem disturbances
caused by wildfire, for instance, can cause abrupt transitions in
landscape patterns of vegetation, which may in turn mediate
biomass regrowth and accumulation (Ohmann et al. 2007). Overall,
ecological and physiographic drivers may have complex
interactive effects that should be evaluated to assess biomass
spatial variation and develop biomass prediction models.

In this study, we use statistical and geospatial modeling to examine
spatial variation in above-ground forest biomass distribution
in an environmentally heterogeneous landscape. Using the
Big Sur ecoregion (~80,000 ha) in coastal California as a case
study of a spatially heterogeneous region, we examine the hypothesis
that biomass distribution is spatially autocorrelated and
that its spatial patterns are related to landscape-scale variations in
ecological and physiographic factors.

We estimate forest biomass in 280 field plots from redwood-tanoak
and mixed evergreen forest that span a range of
environmental conditions. The density of these data provides a
unique assessment of biomass spatial variation that is not feasible
with sparser regional-scale data. We assess spatial dependence
and local uncertainty in biomass distribution using
semivariogram analyses, and develop regression models to predict
the effects of ecological and physiographic factors on biomass
distribution. We develop geospatial models of biomass
distribution based on biomass spatial autocorrelation and its
relationships with ecological and physiographic landscape variables.
Finally, using random sample selection, we assess the role that
the spatial distribution and density of field sampling play in the
performance of biomass models in heterogeneous landscapes.

Materials and methods 

Study system 

The Big Sur ecoregion (79,356 ha) extends along the western
slope of the Santa Lucia Range, from Point Lobos south to
Salmon Creek, CA (Fig 1). The region's adjacency to the Pacific
Ocean provides a Mediterranean-type climate with
warm dry summers and mild wet winters. The topography is
highly dissected by steep slopes and drainages, with elevations
ranging from sea level to 1571 m. This environmentally complex
region supports a diversity of plant communities (Henson and
Usner 1996). Upper elevation slopes and rocky ridges tend to
support mixed coniferous forests composed of ponderosa pine
(Pinus ponderosa), sugar pine (P. lambertiana), Jeffrey pine
(P. jeffreyi), Coulter pine (P. coulteri) and Santa Lucia fir (Abies
bracteata). Drier south-facing slopes and ridges at mid elevations
are often dominated by chaparral shrubland and annual
grasslands. Mixed evergreen forests, consisting of coast live oak
(Quercus agrifolia), Shreve's oak (Q. parvula var. shrevei), bay
laurel (Umbellularia californica), and madrone (Arbutus menziesii),
typically occur on moister slopes, which transition to riparian
corridors of redwood (Sequoia sempervirens) - tanoak (Lithocarpus
densiflorus) forest at lower elevations. Low elevation south-
and west-facing slopes support drought-deciduous coastal sage
scrub vegetation (Borchert et al. 2004). The spatial structure of the
vegetation communities in the Big Sur is thought to be largely
determined by disturbance and the physical environment (Davis et al.
2010). Fire is the major ecosystem disturbance in the Big Sur, and
its interactive effects with the physical environment have influenced
the direction and rate of changes in vegetation communities
(Callaway and Davis 1993). Further, the rugged topography of the
region mediates drought severity, which influences the patterns of
seedling recruitment type, and species that depend on fires for
recruitment are favored by increasing drought severity
(Meentemeyer and Moody 2002). Proximity to the Pacific Ocean
moderates temperature year round, providing a mild climate with
minimal variations. Based on the Parameter-elevation Regressions on
Independent Slopes Model (PRISM; Daly et al. 2001), the region
has a long-term average annual precipitation of 40.5-145.0 cm,
minimum temperature of 5.5-9.6 °C and maximum temperature
of 16.9-21.3 °C.

Field plot data 

Over the summers of 2006 and 2007, we established 280 randomly
distributed field plots (500 m²) in mixed evergreen (n =
162) and redwood-tanoak (n = 118) forest communities (Fig. 1).
We recorded the plot center using GPS (global positioning system)
receivers with a horizontal accuracy within 1 m using differential
correction (Trimble Navigation Limited, Sunnyvale, CA)
and mapped the location of every live and dead stem > 1 cm
DBH (diameter at breast height, 1.3 m) relative to the plot center.
For each tree, we identified the species and measured its DBH.
We estimated the above-ground biomass of individual live trees
greater than 2.5 cm DBH using allometric equations summarized
by Jenkins et al. (2003). The DBH measurements were taken
from 8,812 trees, and the DBH distribution had a mean of 18 cm,
minimum of 2.5 cm, maximum of 243.7 cm, and standard deviation
of 24.4 cm. The estimated biomass of all trees within a plot was
summed to represent plot biomass (expressed in Mg/500 m²).
The distance between a plot and its closest neighbor ranged from
164 m to 2,990 m, with an average of 650 m.
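
The plot-level summation described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes the log-log form commonly used for diameter-based allometric equations, ln(biomass in kg) = b0 + b1 ln(DBH in cm), and the coefficients in `EXAMPLE_COEFFS`, the species-group names, and the treatment of stems exactly at the 2.5 cm cutoff are hypothetical placeholders.

```python
import math

# Hypothetical (b0, b1) coefficients per species group, for illustration only.
# Real values must be taken from the published allometric tables.
EXAMPLE_COEFFS = {
    "hardwood": (-2.5, 2.5),
    "softwood": (-2.0, 2.3),
}


def tree_biomass_kg(dbh_cm, b0, b1):
    """Above-ground biomass (kg), assuming ln(biomass) = b0 + b1 * ln(DBH)."""
    return math.exp(b0 + b1 * math.log(dbh_cm))


def plot_biomass_mg(trees, coeffs, min_dbh_cm=2.5):
    """Sum live-tree biomass over one plot and return Mg per plot.

    `trees` is an iterable of (species_group, dbh_cm, is_alive) tuples.
    """
    total_kg = 0.0
    for group, dbh_cm, is_alive in trees:
        if not is_alive or dbh_cm < min_dbh_cm:
            continue  # dead stems and stems below the DBH cutoff are skipped
        b0, b1 = coeffs[group]
        total_kg += tree_biomass_kg(dbh_cm, b0, b1)
    return total_kg / 1000.0  # 1 Mg = 1000 kg
```

Because each plot covers 500 m², the returned value is directly in the Mg/500 m² unit used throughout the paper.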



Fig 1. Study system showing vegetation types across the Big Sur 
ecoregion and plot locations. The inset shows a detail about spatial 
heterogeneity in vegetation communities and spatial sampling design. 


Ecological and physiographic variables 

Within a GIS, we assembled raster maps of ecological and
physiographic variables (Table 1) that we hypothesized to influence
plant growth and biomass accumulation in the study region.
The spatial distribution of mixed evergreen, redwood-tanoak
forest, montane conifer and other forested vegetation types was
mapped by Meentemeyer et al. (2008) using a combination of
high-resolution imagery and field data (Fig 1). We quantified the
heterogeneity in vegetation types surrounding each plot as the
proportion of forest and non-forest vegetation types at varying
radii (50, 100, and 200 m) and measured the distance of each plot
to the nearest forest edge. To assess landscape-scale variation in
soil properties, we used maps of soil orders and sub-orders from
the Soil Survey Geographic (SSURGO) database
( http://soildatamart.nrcs.usda.gov accessed March 2009). We
identified the number of fire events since 1900, 1950 and 1980
using interagency fire history data from co-operative efforts of
the California Department of Forestry and Fire Protection,
United States Department of Agriculture Forest Service Region 5,
Bureau of Land Management, and National Park Service, and
distributed by the United States Forest Service
( http://www.fs.fed.us/r5/rsl/clearinghouse/ ).

To assess physiographic variations, we mapped elevation,
slope, topographic moisture index (TMI), and potential solar
radiation (PSR) for the months of March, June and December
using a 30 m digital elevation model (DEM) (Table 1). TMI
characterizes topographic redistribution of soil moisture and is
computed as the natural log of the ratio between upslope drainage
area and local slope gradient (Moore et al. 1991). PSR characterizes
topographic variations in potential incoming solar radiation
using the cosine of illumination on slope equation (Dubayah 1994).
Climatic variation was mapped using 30-year average annual
precipitation, and minimum, maximum, and mean temperature.
We also mapped latitude and proximity to the Pacific Ocean as
indirect gradients of geographical and physiographical effects on
climate. Values for each of the nine variables were identified at
all 280 plot locations.
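
As a small illustration of the TMI definition given above (natural log of upslope drainage area over local slope gradient), the following sketch computes the index for a single raster cell. It is an assumption-laden example rather than the GIS workflow used in the study: the slope gradient is taken as tan(slope), and the `min_slope_deg` floor for flat cells is a hypothetical choice.

```python
import math


def topographic_moisture_index(upslope_area, slope_deg, min_slope_deg=0.1):
    """TMI = ln(a / tan(beta)) for one cell.

    upslope_area:  upslope drainage area (per unit contour width; units assumed)
    slope_deg:     local slope in degrees
    min_slope_deg: hypothetical floor that avoids division by zero on flat cells
    """
    beta = math.radians(max(slope_deg, min_slope_deg))
    return math.log(upslope_area / math.tan(beta))
```

Higher values correspond to cells with large contributing areas and gentle slopes, where soil moisture is expected to accumulate.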


Table 1. Ecological and physiographic variables assembled for the Big Sur region 


| Variables | Data source |
| --- | --- |
| Vegetation community type - Mixed Evergreen (ME) and Redwood-tanoak (RWTO) | Vegetation map by Meentemeyer et al. (2008) |
| Distance to forest (DistFor) and non-forest (DistNonFor) in meters | Vegetation map by Meentemeyer et al. (2008) |
| Soil order (Ord) and sub-order (Sub Ord) | Soil Survey Staff, Natural Resources Conservation Service, United States Department of Agriculture |
| Number of fire outbreaks (N) | California Department of Forestry and Fire Protection, United States Forest Service, Bureau of Land Management, and National Park Service |
| Elevation in meters (E), slope in degrees (S), topographic moisture index (TMI), and potential solar radiation index (PSR) for the months of March (PSR Mar), June (PSR June) and December (PSR Dec) | United States Geological Survey, National Elevation Dataset (Gesch, D.B., 2007) |
| Temperature in °C (T) and precipitation in meters (P) | PRISM (Daly et al. 2001) |
| Distance to coast in meters (Dist Coast) | County boundary map, California Department of Conservation |
| Latitude in meters (X) | Study area boundary |

Spatial autocorrelation and local uncertainty


We assessed spatial dependence in biomass distribution between 
field plots using semivariance analyses (Webster and Oliver 
2007). Our plot data were normalized for semivariance analyses 
using natural log transformation. Semivariance is a measure of 
dissimilarity of observations across space and was computed 
from our sample data as: 




$$\hat{\gamma}(h) = \frac{1}{2n} \sum_{i=1}^{n} \{ z(x_i) - z(x_i + h) \}^2$$

where n is the number of pairs of sample observations separated
by a distance h, and z(x_i) and z(x_i + h) are sample measurements
separated by a distance h. The plot of $\hat{\gamma}(h)$ against h is
known as a semivariogram. We used the semivariogram to summarize
spatial variation as a function of lag separation using three
parameters - nugget, sill and range. The relative nugget effect,
computed as the ratio of nugget to sill, was used as an index of
spatially correlated variation.
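
The semivariance estimator described above can be sketched in a few lines. This is a minimal, illustrative version, not the software used in the study: the lag-binning rule (each pair is assigned to the nearest multiple of `lag_width`) and the function name are assumptions.

```python
import math


def empirical_semivariogram(coords, values, lag_width, n_lags):
    """Return a list of (mean_distance, semivariance, n_pairs), one per lag.

    Lag k (k = 1..n_lags) collects the pairs whose separation distance is
    closest to k * lag_width. Semivariance is half the mean squared
    difference between the paired values.
    """
    sums = [0.0] * (n_lags + 1)    # sum of squared differences per lag
    dists = [0.0] * (n_lags + 1)   # sum of pair distances per lag
    counts = [0] * (n_lags + 1)    # number of pairs per lag
    for i in range(len(coords)):
        for j in range(i + 1, len(coords)):
            d = math.dist(coords[i], coords[j])
            k = int(d / lag_width + 0.5)   # index of the nearest lag centre
            if 1 <= k <= n_lags:
                sums[k] += (values[i] - values[j]) ** 2
                dists[k] += d
                counts[k] += 1
    result = []
    for k in range(1, n_lags + 1):
        if counts[k] > 0:
            result.append((dists[k] / counts[k],
                           sums[k] / (2.0 * counts[k]),
                           counts[k]))
    return result
```

To follow the paper's preprocessing, the plot biomass values would be passed through `math.log` before calling the function, since the data were log-transformed for the semivariance analyses.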

We modeled the local uncertainty in biomass distribution using
a conditional cumulative distribution function (ccdf), which
summarizes the probability that the value of a variable is below a
certain value, conditioned to the observed values of the variable.
This non-parametric approach to assessing local uncertainty of a
spatially distributed variable transforms the observed values into
a vector of K indicators consisting of 0 and 1 using the
transformation function I(x; z_c) (Goovaerts 1997):

$$I(x; z_c) = \begin{cases} 1, & z(x) < z_c \\ 0, & \text{otherwise} \end{cases} \qquad c = 1, \ldots, K \mid (n)$$

where z_c is a threshold value, K is a user-specified number of
cutoff thresholds, and |(n) indicates conditioning to local
information, i.e., data in neighboring locations. We quantified the spatial
structure of local uncertainty using semivariograms of indicator
transforms, often called indicator semivariograms. In order to
model local uncertainty in biomass distribution, we transformed
biomass data into 1 and 0 based on seven threshold values from
its histogram (second to eighth deciles, which are 8.59, 10.03,
13.19, 15.46, 19.53, 22.93, and 29.18) and computed the indicator
semivariograms. We excluded the first and ninth deciles because
indicator semivariograms of extreme thresholds are not
well defined.
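
The indicator coding described above is simple enough to show directly. The sketch below is illustrative only: the function name is hypothetical, and the strict "below threshold" comparison follows the transformation as printed in the text.

```python
# Thresholds reported in the text: second to eighth deciles of plot biomass.
DECILE_THRESHOLDS = [8.59, 10.03, 13.19, 15.46, 19.53, 22.93, 29.18]


def indicator_transform(values, thresholds):
    """Code each observation as a vector of K indicators.

    Element c of an observation's vector is 1 when the observed value is
    below threshold c and 0 otherwise.
    """
    return [[1 if z < zc else 0 for zc in thresholds] for z in values]
```

Taking column c of the result across all plots gives the 0/1 variable whose semivariogram is the indicator semivariogram for threshold c.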

Biomass prediction models 

Spatial autocorrelation in biomass distribution and its relationship
with ecological and physiographic variables allowed us to
develop two types of biomass prediction models: (1) a geostatistical
prediction model that relies on spatial autocorrelation, popularly
known as kriging (Webster and Oliver 2007), and (2) an
environmental correlation model (McKenzie and Ryan 1999) based
on the predictive relationship of biomass with ecological and
physiographic variables. To take advantage of both prediction
methods, we developed a hybrid prediction model, popularly
known as Regression Kriging (RK), which combines deterministic
spatial variation described by a regression model with geostatistical
prediction of regression residuals using kriging (Hengl et
al. 2004; Vanwalleghem and Meentemeyer 2009). RK assumes
that the prediction of a variable Z(x_i) at an unvisited location (x_i)
is an additive function of variables describing spatial and
environmental variation:

$$Z(x_i) = m(x_i) + \varepsilon'(x_i) + \varepsilon''$$

where m(x_i) is a structural component that can have a constant
mean or exhibit a trend, $\varepsilon'(x_i)$ is the random but spatially
autocorrelated variation, and $\varepsilon''$ is the spatially uncorrelated
residual error term.

To model deterministic spatial variation, we explored the predictive
association of biomass with ecological and physiographic
variables using regression tree (RT) analyses. RT is a non-parametric
tree-based regression approach that follows binary
recursive partitioning, whereby tree models are constructed by
repeated splitting of the set of observations (parent nodes) into
two descendant subsets (child nodes) such that data within child
nodes are relatively homogeneous (Breiman et al. 1984). It is
well suited to the analysis of complex ecological data, which
requires flexible and robust analytical methods to deal with
nonlinear relationships and higher order interactions (DeAth and
Fabricius 2000). We built several tree models of biomass variation
using v-fold cross validation (Breiman et al. 1998), with v =
10 in our case, and the model that explained the highest amount
of variation during validation (cross validation R²) was selected
as the optimal prediction model. In v-fold cross validation, the
data are divided into v subsets of approximately equal size, and the
model is trained v times, each time leaving out one of the subsets from
training and using this omitted subset for model validation. To
model autocorrelated spatial variation, we assessed spatial
autocorrelation in RT prediction residuals using a semivariogram, and
mapped regression residuals using ordinary kriging (OK). RT
and OK differ in the sense that RT is an aspatial approach based
on the predictive associations of biomass with environmental
covariates, while OK is a univariate spatial approach that relies
on the model of spatial autocorrelation structure and observed
data to derive spatial interpolations. Finally, the RT model was
combined with OK of regression residuals to develop the RK model.
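
The combination of a regression prediction with kriged residuals can be sketched as below. This is a self-contained illustration under stated assumptions, not the GIS implementation used in the study: the spherical model is parameterised with the total sill (nugget included), the regression prediction at the target location is supplied by the caller, and all function names are hypothetical.

```python
import math


def spherical(h, nugget, sill, rng):
    """Spherical semivariogram; `sill` is assumed to be the total sill."""
    if h == 0.0:
        return 0.0
    if h >= rng:
        return sill
    r = h / rng
    return nugget + (sill - nugget) * (1.5 * r - 0.5 * r ** 3)


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def ordinary_kriging(coords, values, target, gamma):
    """Ordinary kriging estimate at `target` from a semivariogram function."""
    n = len(coords)
    # Kriging system: semivariances between data points, plus the
    # unbiasedness constraint (weights sum to 1) via a Lagrange multiplier.
    a = [[gamma(math.dist(coords[i], coords[j])) for j in range(n)] + [1.0]
         for i in range(n)]
    a.append([1.0] * n + [0.0])
    b = [gamma(math.dist(c, target)) for c in coords] + [1.0]
    weights = solve(a, b)[:n]
    return sum(w * v for w, v in zip(weights, values))


def regression_kriging(trend_at_target, coords, residuals, target, gamma):
    """RK prediction = regression prediction + kriged regression residual."""
    return trend_at_target + ordinary_kriging(coords, residuals, target, gamma)
```

In use, `gamma` would be a fitted residual semivariogram, for example `lambda h: spherical(h, nugget, sill, rng)` with parameters estimated from the regression residuals, and `trend_at_target` would be the regression tree prediction for the target cell.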

To assess model prediction quality, we randomly split the data
into a model set (75% of the data, N: 210) and a validation set
(25%, N: 70); prediction models were developed using the model set
and their predictions were compared at validation locations. We assessed
overall model fit using the coefficient of determination (R²) and
model prediction quality using the Root Mean Squared Error
(RMSE). The R² indicates the overall agreement between predicted
and true values, while RMSE indicates the overall prediction
error, expressed in the units of the predicted variable.
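
The two validation measures can be written out explicitly. The sketch below is illustrative: the paper does not state which formula was used for R², so the common definition 1 - SS_res / SS_tot is assumed here, and the function names are hypothetical.

```python
import math


def rmse(observed, predicted):
    """Root mean squared error between observed and predicted values."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)


def r_squared(observed, predicted):
    """Coefficient of determination, assumed here as 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot
```

Under this definition a model that always predicts the observed mean scores an R² of zero, which gives a useful baseline when reading validation results.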

Effect of sample size 

Accurate predictions of spatially autocorrelated variables using
RK depend on the performance of environmental correlation
models, which largely depends on sample size. To assess the
effects of sample size and selected samples on environmental
correlation models, we developed RT models using varying
sample sizes drawn with two sampling schemes. First, we
developed models from randomly selected 80%, 70%, 60%, 50%,
40%, and 30% of the data, and compared model fit using R².
Second, we randomly selected and set aside 20% of the data as
independent validation data (N: 56) and developed models from
the remaining data using sample sizes equal to 80%, 70%, 60%, 50%,
40%, and 30% of the full data set. This
resulted in 6 model datasets consisting of 224, 196, 168, 140,
112 and 84 samples. Both sampling schemes were implemented
within a GIS, and regression analyses were done within a statistical
package. For each sample size, the regression modeling was
done 10 times by random selection of modeling data, resulting in
a total of 60 models. The difference in variance explained by
models of different sample sizes was used to assess the effect of
sample size on the RT model, and the difference in variance
explained by the 10 models developed from a given sample size was
used to assess the effect of the selected samples on the RT model. We
also assessed spatial autocorrelation structure in biomass distribution
using reduced sample sizes.
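
The second sampling scheme (hold out 20% for validation, then draw repeated model sets of decreasing size) can be sketched as follows. This is an illustration only, not the GIS tool used in the study; the function name, the fixed random seed, and the use of plot indices in place of plot records are assumptions.

```python
import random


def draw_model_sets(n_total, percents=(80, 70, 60, 50, 40, 30),
                    repeats=10, holdout_pct=20, seed=0):
    """Return (validation_indices, {sample_size: [index lists]}).

    Sample sizes are percentages of the full data set, drawn without
    replacement from the plots that remain after the validation hold-out.
    """
    rng = random.Random(seed)   # fixed seed is a hypothetical choice
    indices = list(range(n_total))
    rng.shuffle(indices)
    n_holdout = n_total * holdout_pct // 100
    validation = sorted(indices[:n_holdout])
    pool = indices[n_holdout:]
    model_sets = {}
    for pct in percents:
        size = n_total * pct // 100
        model_sets[size] = [sorted(rng.sample(pool, size))
                            for _ in range(repeats)]
    return validation, model_sets
```

With 280 plots this yields 56 validation plots and ten draws at each of the six sample sizes, matching the 60 models described above.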

We used ArcGIS 9.2 (Environmental Systems Research Institute,
Redlands, CA) for GIS analyses, spatial interpolation and
map visualization, Hawth's Analysis Tools (Beyer, H. L. 2004)
for random sampling to assess the effects of sample size, and
JMP (SAS, Cary, NC) for regression analyses.

Results 

Biomass autocorrelation and uncertainty 

Biomass distribution in the Big Sur ranged from 0.8 to 139.6
Mg/500 m² with an average density of 22.6 Mg/500 m². The
distribution was wide (coefficient of variation 80%) and positively
skewed, with density over 50 Mg/500 m² in 22 plots and
over 100 Mg/500 m² in only one plot. Plots with large biomass
density resulted from a few large-diameter trees rather than many
smaller trees within a plot. Biomass differed significantly between
mixed evergreen and redwood-tanoak forests (F = 68.8, p
< 0.00), with an average biomass density of 15.5 Mg/500 m² for
mixed evergreen and 31.7 Mg/500 m² for redwood-tanoak forests.

Spatial autocorrelation in biomass distribution was modeled
with a spherical semivariogram (Fig 2) fitted using the parameters:
nugget 0.33, sill 0.59 and range 3,185 m. The range (3,185
m) suggests that biomass was spatially autocorrelated at a
sub-regional scale, and plots with separation distances greater than
3.1 km may be considered spatially independent. Indicator
semivariograms were modeled using a sum of a nugget effect and a
spherical model, with semivariogram parameters summarized in
Table 2. The structure of indicator semivariograms varied along
the transition from low to high thresholds, and the nugget effect
increased from 4% in the semivariogram for the second decile to
49% for the eighth decile. The autocorrelation range increased
with higher threshold values and ranged from 818 to 3,650 m,
which suggests that lower biomass areas were locally constrained
while higher biomass areas were spatially autocorrelated across
larger distances.



Fig 2. Semivariogram of biomass density 


Table 2. Parameters used to model indicator semivariograms of
biomass density

| Threshold decile | Nugget | Sill | Range (m) | Nugget effect (%)* |
| --- | --- | --- | --- | --- |
| Second | 0.007 | 0.169 | 818 | 4 |
| Third | 0.012 | 0.208 | 1058 | 6 |
| Fourth | 0.024 | 0.234 | 1545 | 10 |
| Fifth | 0.052 | 0.253 | 1600 | 20 |
| Sixth | 0.090 | 0.243 | 2245 | 37 |
| Seventh | 0.089 | 0.205 | 2850 | 43 |
| Eighth | 0.077 | 0.158 | 3650 | 49 |

*computed as the ratio of nugget to sill variance and expressed as %
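
As a worked check of the nugget effect column, the ratio can be recomputed from the nugget and sill values reported in Table 2. The sketch below is illustrative; the function and variable names are hypothetical, and it assumes the reported sill is the total sill (nugget included).

```python
def relative_nugget_percent(nugget, sill):
    """Relative nugget effect: nugget as a percentage of the sill."""
    return 100.0 * nugget / sill


# (nugget, sill) pairs as reported in Table 2, keyed by threshold decile.
TABLE2 = {
    "Second": (0.007, 0.169),
    "Third": (0.012, 0.208),
    "Fourth": (0.024, 0.234),
    "Fifth": (0.052, 0.253),
    "Sixth": (0.090, 0.243),
    "Seventh": (0.089, 0.205),
    "Eighth": (0.077, 0.158),
}

for decile, (nugget, sill) in TABLE2.items():
    print(decile, round(relative_nugget_percent(nugget, sill), 1))
```

The recomputed percentages agree with the tabulated column to within about one percentage point. Applying the same ratio to the biomass semivariogram parameters reported in the text (nugget 0.33, sill 0.59) gives roughly 56%.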

Biomass prediction models 

The optimal RT model for biomass prediction had eight prediction
nodes, which were developed using six predictor variables
(vegetation, TMI, northing, PSR Dec, elevation, and distance to
non-forest) (Fig 3). The model explained 35% of the variation in
biomass with a cross validation RMSE of 14.8. Vegetation type
was identified as the first splitting variable, with higher biomass
in redwood-tanoak forests, and the effects of other predictor
variables were nested within the effects of vegetation type on
biomass. Geographic differences, potential solar radiation and
elevation were predictors of biomass variation in mixed-evergreen
forests, while TMI, distance to non-forest and elevation
were predictors in redwood-tanoak forest. For both vegetation
types, lower biomass was found in higher elevation plots
and vice versa. As redwood-tanoak forests are found in lower
elevation areas, a smaller threshold was selected for predictive
splitting in redwood-tanoak plots (elevation = 381 m) compared
to mixed-evergreen plots (elevation = 767 m). Residuals from
RT predictions were spatially autocorrelated up to 2,330 m, and the
RK model implemented within GIS produced spatially varying
patterns of biomass distribution across the Big Sur (Fig 4). When
model prediction quality was assessed at independent validation
sites (N: 70), the RT model explained 26% of biomass variation with
a prediction RMSE of 15.8, and the RK model explained 41% of
biomass variation with a prediction RMSE of 13.2.

Fig 3. Predictive structure of the Regression Tree model for biomass
prediction based on ecological and physiographic variables. Ovals and
squares represent non-terminal and terminal nodes, respectively. The
values inside the nodes are the predicted values with prediction standard
deviation within the bracket. Prediction criteria are presented between
the nodes. The dark solid arrows indicate the combination of
environmental conditions associated with the best predictions, while the
dashed arrow indicates the combination of environmental conditions
associated with the poorest prediction. Abbreviations for predictor
variables are - MO: mixed oak, RWTO: redwood-tanoak, PSR Dec:
potential solar radiation for the month of December, DistNonFor:
distance to non-forest, TMI: topographic moisture index.

Fig 4. Biomass map (Mg/500 m²) derived using Regression Kriging.
The inset shows the detailed pattern of biomass spatial
variation.


Effects of sample size 

RT models developed with sample sizes ranging from 112 to 280
explained 32%-45% of the variance in biomass, and the reduction
in sample size did not have a significant effect on model fit
(Table 3). Across all sample sizes, vegetation type and TMI were
consistently identified as predictors of biomass. PSR was also
selected across sample sizes, except in the scenario that used 112
observations. As expected, RT models based on smaller sample
sizes had fewer prediction nodes, which varied from 8 nodes
with 280 samples to 4 nodes with 112 samples. For a given sample
size (with ten models) and across the sample sizes of 224,
196, 168, 140, 112 and 84, RT model fit did not differ significantly
(Fig 5). However, when the models were tested with an
independent dataset, prediction quality was poorer
than the fit obtained with the model set. When tested for spatial
autocorrelation, sample sizes of 112 and lower did not exhibit
spatial autocorrelation structure.


Table 3. Predictor variables, number of prediction nodes, and
variance explained by regression tree prediction models developed with
varying sample sizes

| Sample size | Predictor variables | N nodes | R² |
| --- | --- | --- | --- |
| 280 | Vegetation, TMI, PSR Dec, Elevation, Northing*, Dist NonFor | 8 | 0.33 |
| 252 | Vegetation, TMI, PSR Dec, Elevation, Northing | 7 | 0.32 |
| 224 | Vegetation, TMI, PSR Dec, Elevation, Northing, Dist NonFor | 7 | 0.31 |
| 196 | Vegetation, TMI, PSR Dec, Distance to non forest | 4 | 0.45 |
| 168 | Vegetation, TMI, PSR Dec | 5 | 0.32 |
| 140 | Vegetation, TMI, PSR Dec | 4 | 0.35 |
| 112 | Vegetation, TMI | 4 | 0.32 |

Abbreviations - TMI: topographic moisture index, Dist NonFor: distance to
non-forest; PSR Dec: potential solar radiation for the month of December
* Northing represents latitude in meters

Fig 5. Effects of sample size and samples on regression tree prediction
performance during model development (model fit) and validation
(model validation), plotted against sample size (224, 196, 168, 140, 112
and 84). The size of each box indicates the average R², which is
also plotted within the box; the error bars indicate the standard deviation
of R², and the maximum R² is plotted above the bars.

Discussion 

We assessed spatial variation of above-ground biomass distribu¬ 
tion in mixed-evergreen and redwood-tanoak forests that extend 
across heterogeneous landscapes of the Big Sur ecoregion, and 
developed models to predict and map biomass at unsampled 
locations. Semivariograms revealed that biomass distribution 
was spatially structured and autocorrelated up to 3,185 m. 
Semivariance in the first lag, which was estimated from 25 sam¬ 
ple pairs with an average spacing of 375 m, accounted for over 
half of the sill variance. Therefore, substantial portion of biomass 
spatial variation in the Big Sur are localized, which may be at¬ 
tributed to the effects of ecological and physiographic gradients 
(Blackard et al. 2008) that constitute the landscape heterogeneity 
of the Big Sur. This corroborates with recent study (Freeman and 
Moisen 2007) which observed weak spatial autocorrelation in 
biomass distribution for some regions in western US; such weak 
autocorrelation is very likely related to the scale of analyses 
whereby biomass distribution may be autocorrelated within spa¬ 
tial scales of few kilometers (as in Big Sur) that are shorter than 
the lag spacing (5-100 km) that was used to model biomass 
semivariogram. The range of indicator semivariograms increased 
with higher threshold values (Table 2), which suggests that spatial
clusters of lower biomass areas were locally constrained and
those of higher biomass areas extended across larger distances.
This conforms to spatial patterns of vegetation distribution that
result from smaller trees being clumped within finer spatial
scales, while larger trees are distributed in a fairly regular
pattern across larger distances so as to achieve dominance by
avoiding direct competition for nutrients and sunlight with other
trees of similar age (Gilbert and Lowell 1997).
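
The semivariogram quantities discussed above (semivariance per lag, the
number of sample pairs per lag, and the indicator transform used for
threshold-based semivariograms) can be sketched as follows. This is a
minimal illustration and not the authors' implementation: the function
names, the toy plot coordinates and biomass values, the 400 m lag width,
and the threshold of 15 are all assumptions made for the example.

```python
import math
from itertools import combinations


def empirical_semivariogram(coords, values, lag_width, n_lags):
    """Return (mean distance, semivariance, pair count) for each lag bin.

    The semivariance of a bin is half the mean squared difference between
    the values of all point pairs whose separation falls in that bin.
    """
    sq_sums = [0.0] * n_lags   # running sum of squared differences
    h_sums = [0.0] * n_lags    # running sum of pair distances
    counts = [0] * n_lags      # number of pairs per bin
    for i, j in combinations(range(len(values)), 2):
        h = math.dist(coords[i], coords[j])
        k = int(h // lag_width)
        if k < n_lags:
            sq_sums[k] += (values[i] - values[j]) ** 2
            h_sums[k] += h
            counts[k] += 1
    result = []
    for k in range(n_lags):
        if counts[k] > 0:
            result.append((h_sums[k] / counts[k],
                           sq_sums[k] / (2 * counts[k]),
                           counts[k]))
        else:
            result.append((None, None, 0))  # no pairs fell in this bin
    return result


def indicator(values, threshold):
    """Indicator transform: 1 where value <= threshold, otherwise 0."""
    return [1.0 if v <= threshold else 0.0 for v in values]


# Hypothetical plot locations (metres) and biomass values, for illustration.
coords = [(0, 0), (300, 0), (0, 300), (900, 0), (900, 900), (1500, 300)]
biomass = [12.0, 14.0, 11.0, 30.0, 28.0, 16.0]

for mean_h, gamma, n_pairs in empirical_semivariogram(coords, biomass, 400.0, 5):
    print(mean_h, gamma, n_pairs)

# Indicator semivariogram for an assumed threshold of 15.
print(empirical_semivariogram(coords, indicator(biomass, 15.0), 400.0, 5))
```

Dividing the first-lag semivariance by the fitted sill gives the share of
spatial variance that is already present at the shortest sampled distances.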

Biomass spatial variation across heterogeneous landscapes of
the Big Sur can be explained based on the ecological and
physiographic variables identified as predictor variables in the RT
models. The RT methodology uses a series of binary splits to
develop prediction models that are each suited to a subset of the data
(Breiman et al. 1984), and such models have been shown to capture the
spatially varying rules of landscape change (McDonald and Urban
2006). Vegetation type was selected as the first predictor
variable, with predicted biomass of 15.5 Mg/500 m² for mixed
evergreen and 31.5 Mg/500 m² for redwood-tanoak forests (Fig
3). Such a large difference in biomass accumulation by vegetation
community type is not surprising considering that redwoods are
predominantly large trees in the Big Sur. The models showed a
positive effect of TMI and PSR on biomass, which reflects the role of
energy and moisture in plant growth and biomass accumulation (Chen et al.
1999). The ecological effects of distance to non-forest were
observed for plots in redwood-tanoak forests, whereby plots that
are close to the forest edge (i.e., shorter distance to non-forest) had
higher biomass than plots located farther inside the
forest. The effect of distance to non-forest on biomass
accumulation is mediated through its effect on forest microclimate,
which alters incoming sunlight and the temperature gradient from outer
to inner forest (Chen et al. 1999; VanWalleghem and
Meentemeyer 2009). Such ecologically mediated effects of forest edge
on biomass are of growing interest considering the increasing rates
of forest fragmentation associated with increasing deforestation
globally (Asbjornsen et al. 2004). Surprisingly, disturbance
history did not affect biomass, which is likely because many
dominant trees in the Big Sur are long-lived and persist throughout
forest succession. Overall, redwood-tanoak forests that have high
soil moisture or are at forest edges (short distance to non-forest)
are hotspots for biomass accumulation in the Big Sur, and such
areas are of interest for enhancing terrestrial carbon storage. With
smaller sample sizes, fewer variables were identified as predictors
of biomass (Table 3), which is expected in landscape- and
regional-scale studies because smaller samples may not be
representative of the wide range in ecological and physiographic
variables that influence plant growth and biomass accumulation.
However, selection of the most important predictor variables
(vegetation type, TMI, PSR) and the model fit were relatively
insensitive to the sample size. Thus, biomass prediction models
developed with small sample sizes are able to map spatial
patterns of biomass distribution resulting from the variation in
major predictor variables. The landscape-context variation in
predictor variables and their predictive relationships with biomass
determine biomass spatial distribution across the Big Sur.
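
The binary-split logic that underlies a regression tree can be sketched
as follows. This is a minimal illustration of how a single split is
chosen (the candidate that most reduces the summed squared error of the
two child nodes), not the software or settings used in this study. The
function names and the toy data, in which feature 0 is a hypothetical
0/1 flag for redwood-tanoak forest and feature 1 is a hypothetical
moisture index, are assumptions.

```python
def sse(ys):
    """Sum of squared deviations from the mean (0 for an empty list)."""
    if not ys:
        return 0.0
    mean = sum(ys) / len(ys)
    return sum((y - mean) ** 2 for y in ys)


def best_split(rows, targets):
    """Find the binary split that minimises the summed SSE of both children.

    Returns (feature index, threshold, left mean, right mean), or None
    when no split improves on the unsplit node.
    """
    best = None
    best_cost = sse(targets)
    for f in range(len(rows[0])):
        levels = sorted(set(r[f] for r in rows))
        # Candidate thresholds are midpoints between consecutive observed values.
        for lo, hi in zip(levels, levels[1:]):
            thr = (lo + hi) / 2
            left = [y for r, y in zip(rows, targets) if r[f] <= thr]
            right = [y for r, y in zip(rows, targets) if r[f] > thr]
            cost = sse(left) + sse(right)
            if cost < best_cost:
                best_cost = cost
                best = (f, thr, sum(left) / len(left), sum(right) / len(right))
    return best


# Hypothetical plots: [is_redwood_tanoak, moisture_index] -> biomass.
rows = [[0, 5.0], [0, 7.0], [0, 6.0], [1, 5.5], [1, 8.0], [1, 6.5]]
targets = [14.0, 16.0, 15.0, 28.0, 36.0, 30.0]

print(best_split(rows, targets))
```

A full tree repeats this search within each child node until a stopping
rule is met, so that each terminal node holds a prediction suited to its
own subset of the data.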

We observed that deterministic spatial variation in biomass
distribution within spatial scales of ~3.1 km
could be accounted for by spatial variation in ecological
and physiographic variables. Spatial variation in ecological and
physiographic variables should therefore be incorporated into
biomass mapping models. This is especially relevant for mapping
biomass distribution across heterogeneous landscapes where
biomass spatial variation occurs within spatial scales that are too
fine to be identified by sparsely sampled data. The biomass map
derived using the RK model (Fig 4) shows a spatially heterogeneous
pattern of biomass distribution across the Big Sur, which would
not be captured by a univariate kriging model that generates
smooth maps. Our ability to infer and map heterogeneous spatial
patterns across the landscape results from the RT model, which
explored the predictive relationship of biomass with ecological and
physiographic variables. Based on independent validation, the RT
model explained 26% of biomass variation with a prediction
RMSE of 15.8. When autocorrelated spatial variation was
combined with predictions from the environmental correlation model,
the RK model explained a larger portion of the variation (41%) and
produced more accurate predictions, with an RMSE of 13.2.
Regional-scale analyses have also shown that incorporating
covariates such as vegetation, elevation difference and soil texture into
biomass mapping models leads to improved biomass estimates
(Sales et al. 2007). When regression residuals are spatially
autocorrelated, regression-based predictions combined with spatial
prediction of regression residuals have been shown to improve
predictions (Hengl et al. 2004). Thus, mapping models developed for
spatial prediction and mapping of biomass distribution across
heterogeneous landscapes, such as the Big Sur, should be based
on both spatial autocorrelation and predictive associations with
environmental variables.
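
The regression kriging idea described above, in which a regression
prediction is added to a kriged estimate of the regression residuals,
can be sketched as follows. This is a simplified illustration under
stated assumptions, not the model fitted in this study: the exponential
semivariogram form, its nugget and sill values, the residuals, and all
function names are hypothetical. Only the range (3,185 m) and the trend
value (15.5, the RT prediction for mixed evergreen forest) are taken
from the text.

```python
import math


def exp_semivariogram(h, nugget, sill, rng):
    """Exponential model (assumed form); gamma(0) = 0, approaching the sill near rng."""
    if h == 0:
        return 0.0
    return nugget + (sill - nugget) * (1.0 - math.exp(-3.0 * h / rng))


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def krige_residual(coords, residuals, target, gamma):
    """Ordinary kriging of residuals at one target; returns (estimate, weights)."""
    n = len(coords)
    # Semivariances between samples, plus a Lagrange row and column that
    # force the kriging weights to sum to one.
    a = [[gamma(math.dist(coords[i], coords[j])) for j in range(n)] + [1.0]
         for i in range(n)]
    a.append([1.0] * n + [0.0])
    b = [gamma(math.dist(c, target)) for c in coords] + [1.0]
    weights = solve(a, b)[:n]
    return sum(w * e for w, e in zip(weights, residuals)), weights


def regression_kriging(trend_at_target, coords, residuals, target, gamma):
    """RK prediction = regression (trend) prediction + kriged residual."""
    kriged, _ = krige_residual(coords, residuals, target, gamma)
    return trend_at_target + kriged


def gamma(h):
    # Hypothetical nugget and sill; range taken from the reported 3,185 m.
    return exp_semivariogram(h, nugget=0.0, sill=200.0, rng=3185.0)


coords = [(0.0, 0.0), (1000.0, 0.0), (0.0, 1500.0)]   # sample plots (metres)
residuals = [4.0, -2.0, 1.0]                          # observed minus RT prediction
print(regression_kriging(15.5, coords, residuals, (400.0, 400.0), gamma))
```

Because kriging honours the data at sampled locations, the residual term
restores local detail that the regression alone smooths over, which is
the mechanism behind the improvement from the RT to the RK model.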

Predictive models and their performance can be sensitive to 
sample size and information contained within the samples; as 
such, landscape and regional scale predictive models often strike 
a balance between sample size and prediction performance. We 
observed that biomass spatial autocorrelation was short-ranged,
with approximately 56% of the spatial variance occurring within
400 m. Thus, mapping biomass distribution with a univariate
kriging approach, which relies on sample data and a model of
spatial autocorrelation, would require high-density sampling
to map spatially heterogeneous variation in biomass distribution
across the Big Sur. RT models developed with sample
sizes as small as 84 identified the major predictors of biomass, and
the model fit remained similar. However, sample sizes smaller
than 112 failed to identify the spatial autocorrelation structure,
indicating the limitation of sample size for semivariogram modeling.
A minimum sample size of 100 has been recommended to
accurately derive a semivariogram (Webster and Oliver 1992).
Therefore, sample size can be a limiting factor for spatially explicit
prediction of biomass distribution across heterogeneous
landscapes, such as the Big Sur, and we recommend that the sample
size be larger than 112.
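
One way to see why small samples limit semivariogram modeling is to
count how many point pairs remain in the short lags as the sample is
thinned. The sketch below does this for randomly generated plot
locations; it is an illustration only. The 20 km x 40 km extent, the
random layout, the 400 m lag width, and the function name are
assumptions, and only the sample sizes (280, 112, 84) come from the text.

```python
import math
import random
from itertools import combinations


def pairs_per_lag(coords, lag_width, n_lags):
    """Count point pairs whose separation distance falls in each lag bin."""
    counts = [0] * n_lags
    for a, b in combinations(coords, 2):
        k = int(math.dist(a, b) // lag_width)
        if k < n_lags:
            counts[k] += 1
    return counts


random.seed(0)  # fixed seed so the illustration is repeatable
# Hypothetical plot locations scattered over a 20 km x 40 km area (metres).
plots = [(random.uniform(0, 20000), random.uniform(0, 40000)) for _ in range(280)]

for n in (280, 112, 84):
    subset = random.sample(plots, n)
    print(n, pairs_per_lag(subset, 400.0, 8))
```

The total number of pairs falls roughly with the square of the sample
size, so the shortest lags, where most of the spatial variance occurs,
lose support fastest.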

Assessment of spatial variation in forest biomass distribution 
in heterogeneous landscapes is challenging, but is needed to 
improve regional-scale assessments of carbon dynamics. As 
environmental variables can be autocorrelated at multiple spatial
scales, largely as a result of environmental forcing or community
processes (Legendre 1993; Overmars et al. 2003), our
ability to assess their spatial variation is highly influenced by the 
data density and spatial configuration of field observations. 
Across the heterogeneous landscapes of the Big Sur, our relatively
dense network of field observations made it possible to detect
localized spatial autocorrelation in biomass distribution.
Therefore, landscape- to regional-scale models of forest biomass
distribution should incorporate spatial autocorrelation in vegetation
and spatial variation in ecological and physiographic factors
using data that sufficiently capture the region's landscape
heterogeneity.